Keynote Speaker
Carl Yang
Expediting Next-Generation AI for Health via KG and LLM Co-Learning
Abstract
Large language models (LLMs) have brought disruptive progress to information technology, from accessing data to performing
analytical tasks. While demonstrating unprecedented capabilities, LLMs have been found unreliable in tasks requiring
factual knowledge and rigorous reasoning, posing critical challenges in domains such as healthcare. Knowledge graphs (KGs)
have been widely used to explicitly organize and index biomedical knowledge, but the quality and coverage of KGs are
hard to scale up given the notoriously complex and noisy nature of healthcare data, which span multiple modalities and institutions.
Existing approaches show promise in combining LLMs and KGs to enhance each other, but they have not studied these techniques in
real healthcare contexts and scenarios. In this talk, I will introduce our research vision and agenda for KG-LLM co-learning
in healthcare, followed by successful examples from our recent explorations of LLM-aided KG construction, KG-guided LLM enhancement,
and federated multi-agent systems. I will conclude the talk by discussing future directions that can benefit from further
collaborations with researchers interested in data mining or biomedical informatics in general.
Bio
Carl Yang is an Assistant Professor of Computer Science at Emory University, jointly appointed at the Department of Biostatistics
and Bioinformatics in the Rollins School of Public Health and the Center for Data Science in the Nell Hodgson Woodruff School of
Nursing. He received his Ph.D. in Computer Science from the University of Illinois Urbana-Champaign in 2020 and his B.Eng. in
Computer Science and Engineering from Zhejiang University in 2014. His research interests span graph data mining, applied machine learning,
knowledge graphs and federated learning, with applications in recommender systems, social networks, neuroscience and healthcare.
Carl's research results have been published in 150+ peer-reviewed papers in top venues across data mining and biomedical informatics.
He is also a recipient of the Dissertation Completion Fellowship of UIUC in 2020, the Best Paper Award of ICDM in 2020, the Best
Paper Award of KDD Health Day in 2022, the Best Paper Award of ML4H in 2022, the Amazon Research Award in 2022, the Microsoft
Accelerating Foundation Models Research Award in 2023, and multiple Emory internal research awards. Carl's research is funded
by both the NSF and the NIH.
Invited Speaker
Jiabin Tang
Graph Language Models
Abstract
In the realm of graph-based research, understanding and leveraging graph structures has become increasingly important, given their
wide range of applications in network analysis, bioinformatics, and urban science. Graph Neural Networks (GNNs) and their heterogeneous
counterparts (HGNNs) have emerged as powerful tools for capturing the intricate relationships within graph data. However, despite their
advancements, these models often struggle with generalization in zero-shot learning scenarios and across diverse heterogeneous graph
datasets, especially in the absence of abundant labeled data for fine-tuning. To address these challenges, we recently introduced two
novel frameworks, “GraphGPT: Graph Instruction Tuning for Large Language Models” and “HiGPT: Heterogeneous Graph Language Model”,
which are designed to enhance the adaptability and applicability of graph models in various contexts. GraphGPT presents a pioneering
approach by integrating Large Language Models (LLMs) with graph structural knowledge through a graph instruction tuning paradigm.
This model leverages a text-graph grounding component and a dual-stage instruction tuning process, incorporating self-supervised
graph structural signals and task-specific instructions. This technique enables the model to comprehend complex graph structures and
achieve remarkable generalization across different tasks without the need for downstream graph data. On the other hand, HiGPT focuses
on heterogeneous graph learning by introducing a heterogeneous graph instruction-tuning paradigm that eliminates the need for
dataset-specific fine-tuning. It features an in-context heterogeneous graph tokenizer and employs a large corpus of heterogeneity-aware
graph instructions, complemented by a Mixture-of-Thought (MoT) instruction augmentation strategy. This allows HiGPT to adeptly handle
distribution shifts in node token sets and relation type heterogeneity, thereby significantly improving its generalization capabilities
across various learning tasks.
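
To make the instruction-tuning idea concrete, the toy Python sketch below shows one way a graph and a task instruction might be paired into a single training prompt. It is purely illustrative: the names (Graph, serialize, build_instruction), the <graph> delimiter, and the plain-text serialization are hypothetical, and the actual GraphGPT/HiGPT systems use learned graph tokens from a text-graph grounding encoder rather than raw text.

    # Illustrative sketch only; not the authors' actual code.
    from dataclasses import dataclass
    from typing import List, Tuple

    @dataclass
    class Graph:
        nodes: List[str]              # node text attributes (e.g., paper titles)
        edges: List[Tuple[int, int]]  # directed edges between node indices

    def serialize(graph: Graph) -> str:
        """Flatten a graph into a token sequence an LLM can condition on.
        Real systems replace this with learned graph tokens from a GNN encoder."""
        node_part = "; ".join(f"[{i}] {t}" for i, t in enumerate(graph.nodes))
        edge_part = ", ".join(f"{u}->{v}" for u, v in graph.edges)
        return f"<graph> nodes: {node_part} | edges: {edge_part} </graph>"

    def build_instruction(graph: Graph, task: str) -> str:
        """Pair structural context with a task-specific instruction, echoing the
        dual-stage idea (stage 1: structural signals; stage 2: task instructions)."""
        return f"{serialize(graph)}\nInstruction: {task}\nAnswer:"

    if __name__ == "__main__":
        g = Graph(
            nodes=["GNNs for citations", "LLMs for graphs", "Graph transformers"],
            edges=[(1, 0), (1, 2)],
        )
        print(build_instruction(g, "Which nodes does node [1] cite?"))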
Bio
Jiabin Tang is a first-year Ph.D. student in Data Science at The University of Hong Kong (HKU), supervised by Prof. Chao Huang
and Prof. Benjamin C.M. Kao. His research interests lie in 1) large language models and other AIGC techniques; 2) graph learning and trustworthy
machine learning; and 3) related deep learning applications, e.g., spatio-temporal data mining and recommendation. He has published papers
at top international AI conferences such as KDD, SIGIR, CIKM, and WWW. He is the lead author of GraphGPT (SIGIR 2024) and HiGPT (KDD 2024),
as well as a co-author of LLMRec (Most Influential Paper at WSDM 2024) and UrbanGPT (KDD 2024). GraphGPT is ranked among the top three most
influential papers of SIGIR 2024; it has been cited over 70 times and has garnered significant attention in the open-source community on
GitHub, receiving over 510 stars.